On the k-Closest Substring and k-Consensus Pattern Problems

Authors

  • Yishan Jiao
  • Jingyi Xu
  • Ming Li
Abstract

Given a set S = {s_1, s_2, ..., s_n} of strings, each of length m, and an integer L, we study the following two problems. k-Closest Substring problem: find k center strings c_1, c_2, ..., c_k of length L minimizing d such that for each s_j ∈ S there is a length-L substring t_j (closest substring) of s_j with min_{1≤i≤k} d(c_i, t_j) ≤ d. We give a PTAS for this problem for k = O(1). k-Consensus Pattern problem: find k median strings c_1, c_2, ..., c_k of length L and a substring t_j (consensus pattern) of length L from each s_j minimizing the total cost w = ∑_{j=1}^{n} min_{1≤i≤k} d(c_i, t_j). We give a PTAS for this problem for k = O(1). Our results improve recent results of [10] and [16], both of which depended on the random linear transformation technique in [16]. For the general-k case, we give an alternative and direct proof of the NP-hardness of (2 − ε)-approximation of the Hamming radius k-clustering problem, a special case of the k-Closest Substring problem restricted to L = m.
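
To make the two objectives in the abstract concrete, the following minimal Python sketch evaluates them for a fixed, already chosen set of centers. It assumes d is the Hamming distance (the abstract's final sentence suggests this, but the paper may define d differently). The function names are hypothetical and are not taken from the paper, and the sketch only scores candidate centers; it is not the PTAS.

```python
from typing import Sequence


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def best_window_distance(s: str, centers: Sequence[str], L: int) -> int:
    """Smallest d(c_i, t) over all length-L substrings t of s and all centers c_i."""
    return min(
        hamming(c, s[p:p + L])
        for p in range(len(s) - L + 1)
        for c in centers
    )


def closest_substring_radius(S: Sequence[str], centers: Sequence[str], L: int) -> int:
    """k-Closest Substring objective: the largest per-string distance (the radius d)."""
    return max(best_window_distance(s, centers, L) for s in S)


def consensus_pattern_cost(S: Sequence[str], centers: Sequence[str], L: int) -> int:
    """k-Consensus Pattern objective: the sum of per-string distances (the cost w)."""
    return sum(best_window_distance(s, centers, L) for s in S)
```

For fixed centers, picking the best substring t_j independently for each s_j is optimal for both objectives, so the two functions differ only in taking a maximum versus a sum.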


Similar resources

Closest Substring Problems with Small Distances

We study two pattern matching problems that are motivated by applications in computational biology. In the Closest Substring problem k strings s_1, ..., s_k are given, and the task is to find a string s of length L such that each string s_i has a consecutive substring of length L whose distance is at most d from s. We present two algorithms that aim to be efficient for small fixed values of d an...

On The Parameterized Intractability Of Motif Search Problems

We show that Closest Substring, one of the most important problems in the field of consensus string analysis, is W[1]-hard when parameterized by the number k of input strings (and remains so, even over a binary alphabet). This is done by giving a “strongly structure-preserving” reduction from the graph problem Clique to Closest Substring. This problem is therefore unlikely to be solvable in tim...

Parameterized Intractability of Motif Search Problems

We show that Closest Substring, one of the most important problems in the field of biological sequence analysis, is W[1]-hard when parameterized by the number k of input strings (and remains so, even over a binary alphabet). This problem is therefore unlikely to be solvable in time O(f(k) · n^c) for any function f of k and constant c independent of k. The problem can therefore be expected to be i...

arXiv:cs.CC/0205056v1, 21 May 2002. Parameterized Intractability of Motif Search Problems

We show that Closest Substring, one of the most important problems in the field of biological sequence analysis, is W[1]-hard when parameterized by the number k of input strings (and remains so, even over a binary alphabet). This problem is therefore unlikely to be solvable in time O(f(k) · n^c) for any function f of k and constant c independent of k. The problem can therefore be expected to be i...

Hard problems in similarity searching

The Closest Substring Problem is one of the most important problems in the field of computational biology. It is stated as follows: given a set of t sequences s_1, s_2, ..., s_t over an alphabet Σ, and two integers k, d with d ≤ k, can one find a string s of length k and, for all i = 1, 2, ..., t, substrings o_i of s_i, all of length k, such that d(s, o_i) ≤ d (for all i = 1, 2, ..., t)? (here, d(·...
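
The decision problem stated above can be illustrated with a brute-force sketch in Python. It assumes d(·, ·) is the Hamming distance, which is an assumption here because the definition is cut off in the excerpt. The function name `find_closest_substring_center` is hypothetical. The search tries all |Σ|^k candidate strings, so it is only usable on toy inputs and is not an algorithm from any of the listed papers.

```python
from itertools import product
from typing import Optional, Sequence


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_closest_substring_center(
    strings: Sequence[str], k: int, d: int, alphabet: str
) -> Optional[str]:
    """Return some length-k string s such that every input string has a
    length-k substring within Hamming distance d of s, or None if none exists.

    Exhaustive search over all len(alphabet)**k candidates (exponential in k).
    """
    for candidate in product(alphabet, repeat=k):
        s = "".join(candidate)
        # Every input string must contain at least one window close to s.
        if all(
            any(hamming(s, si[p:p + k]) <= d for p in range(len(si) - k + 1))
            for si in strings
        ):
            return s
    return None
```

A call such as `find_closest_substring_center(["abab", "bbaa", "aabb"], 2, 1, "ab")` returns the first feasible candidate in the enumeration order of the alphabet.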


Publication date: 2004